Fix SDPO teacher prompt alignment + dashboard visibility#35

Merged
kfallah merged 7 commits into main from fix/sdpo-teacher-prompt-alignment
Feb 24, 2026

Conversation


@kfallah kfallah commented Feb 23, 2026

Summary

  • Root cause fix: Set add_generation_prompt=True in both Tinker (_build_sample_datum) and local/Modal (_build_self_teacher_topk) teacher prompt construction — the <|im_start|>assistant\n turn delimiter was missing after the refactor to pre-tokenized data flow, causing strongly negative advantages for early response tokens
  • Dashboard visibility: Decode and emit teacher_scored_texts in distill metadata across all three backends; render per-sample teacher scored text in the feedback dashboard (old logs without the field are handled gracefully)
  • Tests: Assert add_generation_prompt=True, verify teacher_scored_texts in metadata for single/batch flows, and test dashboard rendering with and without the new field
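The delimiter difference behind the root cause can be sketched with a simplified stand-in for the tokenizer's chat template (the ChatML formatting here is illustrative, not the project's exact template):

```python
def apply_chat_template(messages, add_generation_prompt=False):
    """Render messages in a ChatML-style format; a simplified stand-in
    for tokenizer.apply_chat_template."""
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # The assistant turn delimiter the teacher prompt was missing.
        text += "<|im_start|>assistant\n"
    return text

messages = [{"role": "user", "content": "Be concise."}]
without = apply_chat_template(messages, add_generation_prompt=False)
with_gp = apply_chat_template(messages, add_generation_prompt=True)

# Without the generation prompt, response tokens are scored as if they
# continued the user turn, so early response tokens get low teacher
# probability and strongly negative advantages.
assert with_gp.endswith("<|im_start|>assistant\n")
assert not without.endswith("<|im_start|>assistant\n")
```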

Test plan

  • uv run ruff check claas/ tests/ --fix — all checks passed
  • uv run ty check — all checks passed
  • uv run pytest tests/ -v -m "not integration" — 160 passed, 5 deselected
  • Run integration test on Tinker backend to verify concise preference compliance recovers

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Dashboard: shows "Teacher Scored Text" inside each sample's expandable detail when available; omitted cleanly when absent.
    • Feedback flow: system prompt from completions is preserved and surfaced with samples so teacher context aligns with the displayed scoring text.
  • Bug Fixes

    • Feedback endpoint now validates presence of system prompt and returns a clear 422 error when missing.
  • Documentation

    • Updated references and license in README.

The teacher prompt was missing the `<|im_start|>assistant\n` prefix
before response tokens after the refactor to pre-tokenized data flow.
This caused strongly negative advantages for early response tokens,
regressing concise preference compliance.

- Set `add_generation_prompt=True` in both Tinker and local/Modal
  teacher prompt construction to match student prompt alignment
- Decode and emit `teacher_scored_texts` in distill metadata for
  all three backends (Tinker, local, Modal)
- Render teacher scored text per-sample in the feedback dashboard
- Add tests for prompt alignment, metadata field, and dashboard rendering

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai bot commented Feb 23, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

The changes extend the feedback system to propagate and display per-sample "Teacher Scored Text" throughout the pipeline. Teacher-scored text is computed during distillation, passed through batch processing in the engine, and rendered in the dashboard's expandable detail sections for each sample.

Changes

Cohort / File(s) Summary
Core Teacher-Scored Text Computation
claas/training/distillation.py, claas/training/engine/tinker/engine.py
Extended _build_self_teacher_topk to return a third value (teacher_scored_text string) in addition to logprobs and indices. Updated callers to unpack three values. Modified _build_sample_datum return type to include string values in metrics dict. Enabled generation prompt inclusion by switching add_generation_prompt to True. Batch processing now collects and propagates teacher_scored_texts into DistillResponse metadata.
Dashboard Rendering
claas/dashboard/rendering.py
Added per-sample teacher scoring text rendering in expandable feedback dashboard detail sections. Reads teacher_scored_texts from metrics_payload, guards against missing data with empty list defaults, and conditionally renders teacher section only when text exists for a sample.
Test Coverage
tests/test_api.py
Added two integration tests verifying dashboard rendering: one asserting Teacher Scored Text displays when present, another verifying graceful omission when absent. Tests mock local config, populate temporary feedback logs, and validate /v1/dashboard output.
Engine Tests
tests/test_tinker_engine.py
Updated full-flow and batch distill tests to expect "teacher_scored_texts" metadata entry with length equal to sample count. Added tokenizer mock decode method and assertions for add_generation_prompt=True in teacher prompt construction.

Sequence Diagram(s)

sequenceDiagram
    participant Distillation as Distillation Module
    participant Engine as Engine Module
    participant Rendering as Dashboard Rendering
    
    Distillation->>Distillation: _build_self_teacher_topk(prompt, feedback, response_ids)
    Distillation->>Distillation: Decode teacher inputIds + responseIds
    Distillation->>Distillation: Compute logprobs, indices, teacher_scored_text
    Distillation->>Distillation: Return (logprobs, indices, teacher_scored_text)
    
    Engine->>Engine: _build_sample_datum(sample, tokenizer, ...)
    Engine->>Distillation: Call _build_self_teacher_topk()
    Distillation-->>Engine: Return (logprobs, indices, teacher_scored_text)
    Engine->>Engine: Collect teacher_scored_texts per sample
    Engine->>Engine: Create metrics dict with teacher_scored_text
    Engine->>Engine: Build DistillResponse with teacher_scored_texts in metadata
    
    Engine->>Rendering: Provide metrics_payload with teacher_scored_texts
    Rendering->>Rendering: For each sample, read teacher_scored_text from metrics
    Rendering->>Rendering: Guard against missing text (default to empty list)
    Rendering->>Rendering: Build teacher_section HTML if text exists
    Rendering->>Rendering: Inject teacher_section into sample detail panel
    Rendering-->>Rendering: Render dashboard with Teacher Scored Text
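The guarded rendering path at the end of the diagram can be sketched as follows; the function name and HTML are hypothetical simplifications of claas/dashboard/rendering.py, with only the teacher_scored_texts field taken from the PR:

```python
import html

def render_teacher_section(metrics_payload, sample_index):
    """Return an HTML fragment for a sample's teacher scored text, or ""
    when the field is absent (old logs) or empty for this sample."""
    texts = metrics_payload.get("teacher_scored_texts", [])  # empty-list default
    if sample_index >= len(texts) or not texts[sample_index]:
        return ""  # omit the section cleanly when absent
    return (
        "<details><summary>Teacher Scored Text</summary>"
        f"<pre>{html.escape(texts[sample_index])}</pre></details>"
    )

# Old log without the field: section omitted, no KeyError.
assert render_teacher_section({}, 0) == ""
# New log with the field: section rendered, with markup escaped.
frag = render_teacher_section({"teacher_scored_texts": ["<|im_start|>..."]}, 0)
assert "Teacher Scored Text" in frag
```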

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A teacher's words now dance on screen,
Through distillation, coded clean,
Each sample scores with textual grace,
The dashboard shows each scholar's place,
With tokens decoded, truth laid bare,
The feedback blooms with thoughtful care! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the two main changes: fixing SDPO teacher prompt alignment and adding dashboard visibility for teacher-scored text.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.





@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
claas/training/distillation.py (1)

171-181: ⚠️ Potential issue | 🟡 Minor

Docstring still says "Pair" after return type extended to a triple.

The Returns section should document the new third element teacher_scored_text.

📝 Proposed fix
         Returns:
-            Pair of top-k logprobs and indices for each response token.
+            Triple of (top-k logprobs, top-k indices, teacher_scored_text) for each response token.
+            teacher_scored_text is the decoded full teacher input (prompt + response) with special tokens retained.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claas/training/distillation.py` around lines 171 - 181, Update the docstring
for the function that "Build top-k teacher logits from the frozen base model"
(the function with the current Returns documenting a pair) to reflect that it
now returns a triple: (top-k logprobs, top-k indices, teacher_scored_text);
specifically, change the Returns section to list all three elements and briefly
describe each item (top_k logprobs, top_k indices per response token, and
teacher_scored_text containing the teacher's scored/annotated text for the
response) so the documentation matches the updated return value.
🧹 Nitpick comments (2)
claas/training/engine/tinker/engine.py (1)

351-360: Type annotation dict[str, float | str] omits int for completion_len.

completion_len is assigned an int (line 280), which falls outside float | str. Consider widening to dict[str, float | int | str] for precision.

🔧 Proposed fix
-    metrics: dict[str, float | str] = {
+    metrics: dict[str, float | int | str] = {

And correspondingly update the function return type annotation at line 269:

-) -> tuple[T.Datum, dict[str, float | str]]:
+) -> tuple[T.Datum, dict[str, float | int | str]]:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claas/training/engine/tinker/engine.py` around lines 351 - 360, The metrics
dict's type annotation only allows float | str but completion_len is an int;
change the annotation for the "metrics" variable from dict[str, float | str] to
dict[str, float | int | str] and also update the enclosing function's return
type annotation (the function that defines metrics and currently declares its
return type at the top of the function) to include int alongside float and str
so the returned metrics type matches completion_len's actual type.
claas/training/distillation.py (1)

199-215: GPU→CPU sync via .tolist() occurs before the forward pass.

Line 200 calls teacher_full_ids[0].tolist(), which forces a CUDA device sync at that point. The forward pass (self.base_model(input_ids=teacher_full_ids)) at line 205 still needs the GPU tensor and can't start until the sync completes. Moving the decode to after the with torch.no_grad() block but before del teacher_full_ids allows the GPU to launch the forward pass without waiting for the CPU roundtrip.

⚡ Proposed reordering
         teacher_full_ids = torch.cat([teacher_prompt_ids, response_ids], dim=-1)
-        teacher_scored_text = self.tokenizer.decode(teacher_full_ids[0].tolist(), skip_special_tokens=False)
         teacher_resp_start = teacher_prompt_ids.shape[-1]
         response_token_count = response_ids.shape[-1]

         with torch.no_grad():
             teacher_output = self.base_model(input_ids=teacher_full_ids)
             teacher_logits = teacher_output.logits[:, teacher_resp_start - 1 : -1, :]
             log_probs = self.functional.log_softmax(teacher_logits, dim=-1)
             vocab_size = log_probs.shape[-1]
             k = min(max(1, top_k), vocab_size)
             top_logprobs, top_indices = torch.topk(log_probs[0, :response_token_count], k=k, dim=-1)

         del teacher_output, teacher_logits, log_probs
+        teacher_scored_text = self.tokenizer.decode(teacher_full_ids[0].tolist(), skip_special_tokens=False)
         del teacher_full_ids, teacher_prompt_ids
         torch.cuda.empty_cache()
         return top_logprobs, top_indices, teacher_scored_text
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claas/training/distillation.py` around lines 199 - 215, The current call to
tokenizer.decode(teacher_full_ids[0].tolist()) forces a CUDA->CPU sync before
running the forward pass; move the decode of teacher_full_ids into the no_grad
block after running self.base_model(...) (i.e., perform teacher_output =
self.base_model(input_ids=teacher_full_ids) and compute
teacher_logits/log_probs/topk first, then call teacher_scored_text =
self.tokenizer.decode(teacher_full_ids[0].tolist(), skip_special_tokens=False)
before deleting teacher_full_ids) so the forward can launch on GPU without
waiting for the tolist() CPU roundtrip.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between cbf1916 and a380089.

📒 Files selected for processing (5)
  • claas/dashboard/rendering.py
  • claas/training/distillation.py
  • claas/training/engine/tinker/engine.py
  • tests/test_api.py
  • tests/test_tinker_engine.py

Kion and others added 2 commits February 23, 2026 16:33
Match the hindsight context block from the online SDPO trainer in
lasgroup/user_interactions to use the paper's exact framing:
"=== HINDSIGHT CONTEXT ===" with "future user message" phrasing
instead of the previous "feedback from your earlier attempt" wording.

Add paper and repo references to README and source docstrings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The teacher was always scoring under the generic DEFAULT_SYSTEM_PROMPT
("You are a helpful assistant.") regardless of what system prompt the
student saw. This corrupted the SDPO advantage signal when requests
routed through OpenClaw with a rich agent system prompt.

- Store system_prompt in CompletionCacheEntry from chat completion requests
- Make system_prompt a required field on DistillBatchItem (no default)
- Make system_prompt a required kwarg on build_teacher_messages, removing
  the DEFAULT_SYSTEM_PROMPT fallback
- Validate system_prompt presence in /v1/feedback (422 if missing)
- Thread system_prompt through distillation.py and tinker/engine.py
- Update all tests and JSON fixtures

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

kfallah commented Feb 24, 2026

@codex review

response_token_ids: list[int],
prompt_token_ids: list[int],
response_logprobs: list[float] | None,
system_prompt: str | None = None,
Owner Author


this should not be optional

Owner Author


Fixed in a7e23bc. system_prompt is now str (required, no default) in CompletionCacheEntry. The API defaults to "" when no system messages are present, and the feedback endpoint rejects empty system prompts with a 422.

Comment generated by Claude Code


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (5)
claas/training/engine/tinker/engine.py (2)

292-292: Long line (>120 chars) — consider wrapping.

-    teacher_messages = build_teacher_messages(teacher_prompt_source, sample.feedback, system_prompt=sample.system_prompt)
+    teacher_messages = build_teacher_messages(
+        teacher_prompt_source, sample.feedback, system_prompt=sample.system_prompt
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claas/training/engine/tinker/engine.py` at line 292, The call to
build_teacher_messages is longer than 120 chars; rewrite it to wrap arguments
across multiple lines so it stays under the line-length limit. Locate the call
to build_teacher_messages (using teacher_prompt_source, sample.feedback,
system_prompt=sample.system_prompt) and format it like a multi-line function
call with each argument on its own indented line, preserving the same argument
order and names.

237-255: Widening the metrics dict to float | str weakens type safety on aggregation helpers.

The avg lambda (Line 238) and sum(m["completion_len"] ...) (Line 237) operate on dict[str, float | str] values, so the type checker can't guarantee they receive numerics. This won't break at runtime (only numeric keys are passed), but it may produce type-checker warnings.

A clean alternative: keep the metrics dict as dict[str, float] and return teacher_scored_text as a separate value in the tuple.

♻️ Suggested refactor
 async def _build_sample_datum(
     sample: DistillBatchItem,
     tokenizer: Any,
     teacher_sampling: Any,
     kl_coef: float,
-) -> tuple[T.Datum, dict[str, float | str]]:
+) -> tuple[T.Datum, dict[str, float], str]:

Then unpack in distill():

-        datums = [r[0] for r in results]
-        sample_metrics = [r[1] for r in results]
+        datums = [r[0] for r in results]
+        sample_metrics = [r[1] for r in results]
+        teacher_scored_texts = [r[2] for r in results]

And use teacher_scored_texts directly in the metadata instead of extracting from sample_metrics.

Also applies to: 351-359

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claas/training/engine/tinker/engine.py` around lines 237 - 255, The metrics
dict is currently typed to allow strings which weakens type checking for numeric
aggregations (used by the avg lambda and sum over sample_metrics); change the
producer that builds sample_metrics (the function that returns sample_metrics
inside distill()/wherever it is created) to return a tuple of (list[dict[str,
float]] metrics, list[str] teacher_scored_texts) so sample_metrics can be typed
dict[str, float] and aggregations are type-safe, then unpack that tuple where
used (in distill/around the use of sample_metrics and the metadata block) and
set metadata["teacher_scored_texts"] = teacher_scored_texts; apply the same
refactor for the similar block around lines 351-359 to keep numeric metrics
strictly float and teacher text separate.
claas/api.py (1)

235-236: Multiple system messages are silently concatenated.

If a request contains more than one system message, they're joined with \n. This is a reasonable default, but it may silently alter the system prompt the teacher sees compared to what the model actually received (some backends treat multiple system messages differently). Consider whether a single-system-message constraint or a log warning would be appropriate.
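A minimal sketch of the suggested explicit handling, assuming messages are dicts with role/content keys (the real claas/api.py may differ):

```python
import logging

logger = logging.getLogger(__name__)

def extract_system_prompt(messages):
    """Join system messages into one prompt, warning when more than one is
    present so the silent concatenation is at least visible."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    if len(system_parts) > 1:
        logger.warning(
            "Request contains %d system messages; concatenating with newlines. "
            "The teacher will see a joined prompt the backend may not have.",
            len(system_parts),
        )
    return "\n".join(system_parts)

assert extract_system_prompt(
    [{"role": "system", "content": "A"},
     {"role": "system", "content": "B"},
     {"role": "user", "content": "hi"}]
) == "A\nB"
```

Raising a ValueError instead of warning would enforce the single-system-message constraint the comment floats as the stricter alternative.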

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claas/api.py` around lines 235 - 236, The code currently concatenates
multiple system messages into system_prompt by building system_parts from
messages and joining them; change this to detect when len(system_parts) > 1 and
handle it explicitly: either enforce a single system message by raising a clear
error (ValueError) or emit a visible warning via the module logger (e.g.,
logger.warning) before keeping the existing join behavior; update the logic
around system_parts/system_prompt to perform this check and provide a clear
message referencing messages, system_parts and system_prompt so callers see when
multiple system messages are present.
claas/training/distillation.py (1)

202-202: teacher_scored_text can be large — consider truncation for log storage.

The full decoded teacher prompt + response is stored per-sample in the metadata and ultimately written to feedback logs. For long conversations this could substantially inflate log sizes. A truncated preview (e.g., first/last N characters) might be more practical for the dashboard, with the full text available on demand.
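One way to sketch the suggested truncation; truncate_preview is a hypothetical helper named in the suggestion, not an existing function:

```python
def truncate_preview(text: str, max_chars: int = 200) -> str:
    """Keep the first and last halves of max_chars with an ellipsis marker,
    so long teacher texts don't inflate feedback logs."""
    if len(text) <= max_chars:
        return text
    head = max_chars // 2
    tail = max_chars - head
    return text[:head] + " …[truncated]… " + text[-tail:]

long_text = "x" * 1000
preview = truncate_preview(long_text, max_chars=200)
assert len(preview) < 250 and "…[truncated]…" in preview
assert truncate_preview("short") == "short"
```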

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claas/training/distillation.py` at line 202, Replace the direct full decode
assigned to teacher_scored_text (the line using
self.tokenizer.decode(teacher_full_ids[0].tolist(), ...)) with logic that
decodes the full text into a separate variable (e.g., teacher_full_text =
self.tokenizer.decode(...)) and then creates a truncated preview (e.g.,
teacher_scored_text = truncate_preview(teacher_full_text, max_chars=200) which
keeps the first/last N chars with an ellipsis). Keep teacher_full_text available
if needed elsewhere (store as teacher_full_text or only persist it on demand)
and use teacher_scored_text (the preview) for metadata/log writing to avoid
bloating logs.
tests/test_teacher_helpers.py (1)

33-40: test_system_prompt_matches_input largely duplicates test_custom_system_prompt.

Both tests verify that msgs[0]["content"] equals the provided system_prompt — they differ only in the string value used. Consider consolidating into a single parametrized test if you want to keep both cases.
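The suggested consolidation could look like this; build_teacher_messages is stubbed here so the sketch is self-contained (the real helper lives in claas/training/teacher_helpers.py and builds more than the system message):

```python
import pytest

def build_teacher_messages(prompt, feedback=None, *, system_prompt):
    # Stub of the real helper: only the system message matters for this test.
    return [{"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt}]

@pytest.mark.parametrize(
    "system_prompt",
    ["Be concise.", "You are a pirate. Answer in pirate speak."],
)
def test_system_prompt_matches_input(system_prompt):
    msgs = build_teacher_messages("x", system_prompt=system_prompt)
    assert msgs[0]["content"] == system_prompt
```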

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_teacher_helpers.py` around lines 33 - 40, The two tests
test_custom_system_prompt and test_system_prompt_matches_input duplicate
coverage of build_teacher_messages for system_prompt; consolidate them into a
single parametrized test or remove one. Update the tests to call
build_teacher_messages("x", system_prompt=...) with multiple inputs (e.g., "Be
concise." and the pirate string) using pytest.mark.parametrize or keep only the
more descriptive test_system_prompt_matches_input, ensuring the assertion still
checks msgs[0]["content"] equals the provided system_prompt.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between a380089 and 91af7c4.

📒 Files selected for processing (13)
  • README.md
  • claas/api.py
  • claas/core/types.py
  • claas/inference/cache.py
  • claas/training/distillation.py
  • claas/training/engine/tinker/engine.py
  • claas/training/teacher_helpers.py
  • tests/integration/test_local_engine_integration.py
  • tests/integration/test_tinker_stack_integration.py
  • tests/test_api.py
  • tests/test_local_training_engine.py
  • tests/test_teacher_helpers.py
  • tests/test_tinker_engine.py


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 91af7c491a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

- Make system_prompt required (str, not optional) in CompletionCacheEntry
- Fix docstring: "Pair" → "Triple" for _build_self_teacher_topk return
- Move tokenizer.decode after forward pass to avoid CUDA→CPU sync stall
- Wrap long build_teacher_messages call in tinker engine
- Consolidate duplicate system_prompt tests into parametrized test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

kfallah commented Feb 24, 2026

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a7e23bc106


"not a nested chat template."
),
)
system_prompt: str = Field(


P2: Keep DistillBatchItem compatible with older feedback logs

Making system_prompt required causes previously written feedback log files (which do not have this field in batch_samples) to fail FeedbackLogRecord.model_validate(...); read_recent_feedback_logs then silently skips those files, so /v1/dashboard can lose historical records immediately after deploy. Adding a default/optional value here (or an explicit migration path) would preserve dashboard visibility for pre-existing logs.
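One migration path mentioned above, sketched without pydantic: backfill the new field on old records before validation (record and field names follow the comment; the validation call itself is omitted):

```python
def backfill_system_prompt(record: dict) -> dict:
    """Give pre-existing feedback-log samples an empty system_prompt so
    they still validate after the field became required."""
    for sample in record.get("batch_samples", []):
        sample.setdefault("system_prompt", "")
    return record

# An old log record written before the field existed still round-trips.
old_record = {"batch_samples": [{"prompt": "p", "feedback": "f"}]}
migrated = backfill_system_prompt(old_record)
assert migrated["batch_samples"][0]["system_prompt"] == ""
```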


Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ensures OpenClaw and API logs are captured on failure,
since the cleanup trap was destroying containers before
the CI log-dump step could read them.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Latest openclaw requires gateway.controlUi.allowedOrigins when
binding to non-loopback. We don't need the Control UI in Docker,
so disable it to prevent gateway startup failure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@kfallah kfallah merged commit 600a31d into main Feb 24, 2026
5 checks passed
@kfallah kfallah deleted the fix/sdpo-teacher-prompt-alignment branch February 24, 2026 19:38